An empirical study toward dealing with noise and class imbalance issues in software defect prediction
Abstract
The quality of defect datasets is a critical issue in the domain of software defect prediction (SDP). These datasets are obtained by mining software repositories. Recent studies raise concerns over the quality of such datasets, because of inconsistencies between the bug/clean fix keywords in fault reports and the corresponding links in change management logs. The Class Imbalance (CI) problem is also a big challenge for SDP models. A defect prediction method trained on noisy and imbalanced data leads to inconsistent and unsatisfactory results, so a combined analysis of noisy instances and the CI problem is required. To the best of our knowledge, insufficient studies have been done on these aspects. In this paper, we deal with the impact of noise and the CI problem on five baseline SDP models; we manually added various noise levels (0–80%) and identified their impact on the performance of those models. Moreover, we provide guidelines for the possible range of tolerable noise for the baseline models. We also suggest an SDP model that has the highest noise-tolerance ability and outperforms the other classical methods. The True Positive Rate (TPR) and False Positive Rate (FPR) values of the baseline models decrease by 20–30% after 10–40% noisy instances are added. Similarly, the corresponding figure for ROC (Receiver Operating Characteristic) values is 40–50%. The suggested model can avoid 40–60% noise as compared with traditional models.
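The noise-injection experiment described in the abstract can be illustrated with a short sketch. The abstract does not say how the noise was added, so the code below assumes the simplest scheme: flipping the binary defect label of a randomly chosen fraction of instances. The function names `inject_label_noise` and `tpr_fpr` are hypothetical helpers introduced for illustration, not the authors' implementation.

```python
import random


def inject_label_noise(labels, noise_level, seed=0):
    """Return a copy of binary labels with a fraction of them flipped.

    Assumption: noise means flipping 0 <-> 1 on a uniformly sampled
    subset of instances; the paper may use a different scheme.
    """
    if not 0.0 <= noise_level <= 1.0:
        raise ValueError("noise_level must be between 0 and 1")
    rng = random.Random(seed)
    n_flip = round(noise_level * len(labels))
    flip_idx = set(rng.sample(range(len(labels)), n_flip))
    return [1 - y if i in flip_idx else y for i, y in enumerate(labels)]


def tpr_fpr(y_true, y_pred):
    """Compute (True Positive Rate, False Positive Rate) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


if __name__ == "__main__":
    clean = [1, 0, 0, 0, 1, 0, 0, 0, 0, 1]  # toy imbalanced labels
    for level in (0.0, 0.2, 0.4, 0.8):
        noisy = inject_label_noise(clean, level, seed=42)
        changed = sum(a != b for a, b in zip(clean, noisy))
        print(f"noise={level:.0%} flipped={changed}")
```

In a full experiment under these assumptions, each baseline model would be trained on the noisy labels at every noise level and scored with `tpr_fpr` against clean test labels, so the degradation can be tracked as the noise level rises.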
Similar resources
Dealing with Class Imbalance using Thresholding
We propose thresholding as an approach to deal with class imbalance. We define the concept of thresholding as a process of determining a decision boundary in the presence of a tunable parameter. The threshold is the maximum value of this tunable parameter where the conditions of a certain decision are satisfied. We show that thresholding is applicable not only for linear classifiers but also fo...
An empirical study on software defect prediction with a simplified metric set
Context: Software defect prediction plays a crucial role in estimating the most defect-prone components of software, and a large number of studies have pursued improving prediction accuracy within a project or across projects. However, the rules for making an appropriate decision between within- and cross-project defect prediction when available historical data are insufficient remain unclear. Ob...
Using Class Imbalance Learning for Cross-Company Defect Prediction
Cross-company defect prediction (CCDP) is a practical approach that trains a prediction model on one or multiple projects of a source company and then applies the model to a target company. Unfortunately, the performance of such CCDP models is susceptible to the highly imbalanced nature of the defect-prone and non-defect classes of CC data. Class imbalance learning is applied to alleviat...
Dealing with Multiple Classes in Online Class Imbalance Learning
Online class imbalance learning deals with data streams having very skewed class distributions in a timely fashion. Although a few methods have been proposed to handle such problems, most of them focus on two-class cases. Multi-class imbalance imposes additional challenges in learning. This paper studies the combined challenges posed by multiclass imbalance and online learning, and aims at a mo...
Dealing with software design issues using an Agent-Oriented methodology
Journal
Journal title: Soft Computing
Year: 2021
ISSN: ['1433-7479', '1432-7643']
DOI: https://doi.org/10.1007/s00500-021-06096-3